Abstract

Recently the quantum Bayesian prediction problem was formulated by Tanaka and Komaki
(2005). It was shown that Bayesian predictive density operators are the best predictive density
operators when we evaluate them by using the averaged quantum relative entropy based on a prior
distribution. In the present paper, we adopt the quantum $\alpha$-divergence as a wider class of loss
functions. The generalized Bayesian predictive density operator is defined and shown to be the best
among all the estimates of the unknown density operator.
I. INTRODUCTION



In classical statistics, the problem of predicting an unobserved variable $y$ by using an
observed variable $x$ has been investigated. Suppose that a parametric model

$$\mathcal{P} = \{ p(y|\theta) : \theta \in \Theta \},$$

which is a set of probability densities, is given, where $\Theta$ is a parameter space. Random
variables $x$ and $y$ are distributed according to the same true probability density $p(\cdot|\theta)$ in $\mathcal{P}$.
We predict the unobserved variable $y$ with a predictive density $\hat{p}(y; x)$ constructed by using
the observed variable $x$. The closeness of the true density $p(y|\theta)$ and a predictive density
$\hat{p}(y; x)$ is evaluated by using the Kullback-Leibler divergence

$$D(p\|\hat{p}) := \int p(y|\theta) \log \frac{p(y|\theta)}{\hat{p}(y; x)}\, dy.$$

Aitchison [1] showed that a Bayesian predictive density

$$\hat{p}_\pi(y|x) := \int_\Theta p(y|\theta)\, \pi(\theta|x)\, d\theta, \tag{1}$$

where $\pi(\theta|x)$ is a posterior distribution, is the best predictive density when we evaluate
a predictive density $\hat{p}(y; x)$ by using the average Kullback-Leibler divergence
$\int \pi(\theta) \int D(p\|\hat{p})\, p(x|\theta)\, dx\, d\theta$, where $\pi(\theta)$ is a prior probability density. This result was extended
to the quantum setting by Tanaka and Komaki [3].

Let us consider this result more deeply. From an observation $x$, we obtain the updated
information $\pi(\theta|x)$ on the unknown parameter $\theta$, and Eq. (1) is obtained by taking a mixture of
the possible probability densities $p(y|\theta)$ with respect to $\pi(\theta|x)$. However, there are many ways of
taking a mixture. For example, let $p_1(x)$ and $p_2(x)$ denote two possible Gaussian distributions.
Then $p(x) = \frac{1}{2} p_1(x) + \frac{1}{2} p_2(x)$ is one possibility. Another possible mixture is given by
$\log p'(x) = \frac{1}{2} \log p_1(x) + \frac{1}{2} \log p_2(x)$. The advantage of the latter is that the mixture itself is
again a Gaussian distribution, up to a normalization constant. In classical statistics, such a mixture is often useful, and a
general class of mixtures, the $\alpha$-mixture [3], can be defined by



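$$\hat{p}^{(\alpha)}(y|x) := \left\{ \int \{p(y|\theta)\}^{\frac{1-\alpha}{2}}\, \pi(\theta|x)\, d\theta \right\}^{\frac{2}{1-\alpha}}, \quad \alpha \neq 1, \tag{2}$$

$$\hat{p}^{(+1)}(y|x) := \exp\left\{ \int \log(p(y|\theta))\, \pi(\theta|x)\, d\theta \right\}.$$

Corcuera and Giummole showed that the above predictive distribution (2), which they
called a generalized Bayesian predictive density, is optimal under the following $\alpha$-divergence:

$$D^{(\alpha)}(p\|q) := \frac{4}{1-\alpha^2} \left\{ 1 - \int p(x)^{\frac{1-\alpha}{2}}\, q(x)^{\frac{1+\alpha}{2}}\, dx \right\}, \tag{3}$$

where $\alpha \neq \pm 1$. When $\alpha = \pm 1$, it is defined by

$$D^{(-1)}(p\|q) := \int p(x) \log(p(x)/q(x))\, dx,$$

$$D^{(+1)}(p\|q) := \int q(x) \log(q(x)/p(x))\, dx.$$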

The $\alpha$-divergence is closely related to the $\alpha$-entropy of Rényi and the Chernoff distance
in classical information theory. Our purpose in the present paper is to extend the result
obtained by Corcuera and Giummole to the quantum setting.
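
The classical optimality statement can be illustrated on a finite sample space. The following is a minimal sketch, assuming discrete distributions in place of densities and a two-point posterior; the function names and the numerical values are hypothetical and chosen only for illustration. It normalizes the $\alpha$-mixture explicitly and compares its posterior-averaged $\alpha$-divergence with that of the ordinary (arithmetic) Bayesian mixture.

```python
import math

def alpha_divergence(p, q, alpha):
    """Classical alpha-divergence D^(alpha)(p||q) for discrete distributions."""
    if alpha == -1:
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    if alpha == 1:
        return sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    s = sum(pi ** ((1 - alpha) / 2) * qi ** ((1 + alpha) / 2) for pi, qi in zip(p, q))
    return 4.0 / (1 - alpha ** 2) * (1 - s)

def alpha_mixture(densities, weights, alpha):
    """Normalized alpha-mixture of discrete densities under posterior weights."""
    n = len(densities[0])
    if alpha == 1:
        raw = [math.exp(sum(w * math.log(d[i]) for d, w in zip(densities, weights)))
               for i in range(n)]
    else:
        e = (1 - alpha) / 2
        raw = [sum(w * d[i] ** e for d, w in zip(densities, weights)) ** (1 / e)
               for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

def average_loss(densities, weights, q, alpha):
    """Posterior average of D^(alpha)(p_theta || q)."""
    return sum(w * alpha_divergence(d, q, alpha) for d, w in zip(densities, weights))

# Hypothetical model: two candidate densities on three points, posterior weights 0.6 / 0.4.
densities = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]]
weights = [0.6, 0.4]
alpha = 0.0

best = alpha_mixture(densities, weights, alpha)      # alpha-mixture for the chosen alpha
arithmetic = alpha_mixture(densities, weights, -1)   # alpha = -1 gives the usual mixture (1)

loss_best = average_loss(densities, weights, best, alpha)
loss_arith = average_loss(densities, weights, arithmetic, alpha)
print(loss_best, loss_arith)
```

Under the loss with $\alpha = 0$, the $\alpha$-mixture should not be beaten by the arithmetic mixture, whereas for $\alpha = -1$ the two coincide.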

In quantum statistics, which was initiated by Helstrom, Holevo, and other researchers [?]
a quarter century ago, the optimal estimation of the parameter of an unknown quantum
state has been an active topic over the past several years, driven by recent developments
in experimental techniques [9, 10, 11]. These studies usually consider the ideal situation in which all
measurements (described by POVMs) are allowed, and they often deal with large-sample cases. The
theoretical limit on the accuracy of parameter estimation has been clarified to
some extent. In practical situations, however, we often need to know the density operator
describing the unknown quantum state, rather than the unknown parameter, with a given
measurement device. Buzek et al. [12] recommended using Bayesian techniques, especially
when the sample size of the experimental data is small. They proposed using a posterior state,
which corresponds to the posterior distribution in the classical setting. Tanaka and Komaki [3]
formulated the estimation of the unknown density operator as the quantum prediction problem
and showed that the posterior state is the best among all the estimates of the unknown
density operator.

However, the optimality argument depends on the choice of an evaluation function, which
is called a loss function in mathematical statistics. Here, we adopt a general class of loss
functions as a quantum counterpart of the classical $\alpha$-divergence (3). Then, we define the
generalized Bayesian predictive density operator and show that it is the optimal density
estimate. Our result includes the previous result obtained by Tanaka and Komaki [2] as a
special case.

In the next section, we briefly review our setting, which is essentially the same as in Tanaka
and Komaki, except for the choice of a loss function. In Section III, we prove our main
result. Concluding remarks are given in Section IV.
II. PRELIMINARY



We briefly summarize some notation for quantum measurement. Let $\mathcal{H}$ be a separable
(possibly infinite-dimensional) Hilbert space of a quantum system. A Hermitian operator
$\rho$ on $\mathcal{H}$ is called a state or density operator if it satisfies

$$\mathrm{Tr}\,\rho = 1, \quad \rho \geq 0.$$

We denote the set of all states on $\mathcal{H}$ as $\mathcal{S}(\mathcal{H})$.

Let $\Omega$ be a space of all possible outcomes of an experiment (e.g., $\Omega = \mathbb{R}^n$) and suppose
that a $\sigma$-algebra $\mathcal{B} := \mathcal{B}(\Omega)$ of subsets of $\Omega$ is given. An affine map from $\mathcal{S}(\mathcal{H})$ into a set of
probability distributions on $\Omega$, $P = \{\mu(dx)\}$, is called a measurement. There is a one-to-one
correspondence between a measurement and a resolution of the identity. A map from $\mathcal{B}$
into the set of positive Hermitian operators

$$E : B \mapsto E(B),$$

where $E$ satisfies

$$E(\emptyset) = 0, \quad E(\Omega) = I, \tag{4}$$

$$E\Big(\bigcup_i B_i\Big) = \sum_i E(B_i), \quad B_i \cap B_j = \emptyset \ (i \neq j), \tag{5}$$

is called a positive operator-valued measure (POVM). Any physical measurement can be
represented by a POVM.

Now we describe our setting of state estimation. Assume that a state $\rho_\theta$ on $\mathcal{H}$ is
characterized by an unknown finite-dimensional parameter $\theta \in \Theta \subset \mathbb{R}^n$. If $\dim \mathcal{H} < \infty$, $\theta$ may
cover the full range (often called the full model).

A quantum state for $N$ systems, $\rho^{(N)}$, is described on the $N$-fold tensor product Hilbert
space $\mathcal{H}^{\otimes N}$. Suppose that a system composed of $N+M$ subsystems is given and that a
measurement is performed only on $N$ selected subsystems, with the other $M$ subsystems
left untouched. Then, the measurement is described by $\{E_x \otimes I\}$, where $\{E_x\}$ is a POVM on $\mathcal{H}^{\otimes N}$ and
$I$ is the identity operator on $\mathcal{H}^{\otimes M}$.

Our aim is to estimate the true state $\sigma_\theta := \rho_\theta^{\otimes M}$ of the remaining $M$ subsystems by
using a measurement $\{E_x\}$ on the selected $N$ subsystems $\rho_\theta^{\otimes N}$. We fix an arbitrarily chosen
measurement. Note that the above measurement is not necessarily in the form of a tensor
product $E^{\otimes N}$, which represents a repetition of the same measurement $E_x$ for each system.
Thus, all possible measurements on the $N$ subsystems, which may use entanglement, are
considered. We call an estimate $\hat{\sigma}(x)$ of the true state a predictive density operator. The
problem of quantum prediction is to seek the optimal predictive density operator.

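The performance of a predictive density operator $\hat{\sigma}(x)$ is evaluated by the quantum
$\alpha$-divergence $D^{(\alpha)}(\sigma_\theta\|\hat{\sigma}(x))$, a quantum analogue of the $\alpha$-divergence (3) in classical statistics.
The quantum $\alpha$-divergence from $\rho$ to $\sigma$ is defined by

$$D^{(\alpha)}(\rho\|\sigma) := \frac{4}{1-\alpha^2} \left( 1 - \mathrm{Tr}\, \rho^{\frac{1-\alpha}{2}} \sigma^{\frac{1+\alpha}{2}} \right), \quad \text{if } \alpha \neq \pm 1. \tag{6}$$

When $\alpha = \pm 1$, it is defined by

$$D^{(\alpha=-1)}(\rho\|\sigma) := \mathrm{Tr}\, \rho(\log \rho - \log \sigma) =: D^{(\alpha=1)}(\sigma\|\rho).$$

It satisfies the positivity condition $D^{(\alpha)}(\rho\|\sigma) \geq 0$, and $D^{(\alpha)}(\rho\|\sigma) = 0 \Leftrightarrow \rho = \sigma$. For other
properties and useful inequalities, see, e.g., [13]. Thus, it can be used as a measure of the
goodness of a predictive density operator. Note that the quantum $\alpha$-divergence reduces
to the classical $\alpha$-divergence when the two density operators commute.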
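
The definition (6) can be evaluated numerically through eigendecompositions. The following is a minimal sketch using NumPy, assuming finite-dimensional, full-rank density matrices; the helper names `herm_fn` and `quantum_alpha_divergence` and the example states are hypothetical. It also checks the reduction to the classical $\alpha$-divergence for a commuting (diagonal) pair.

```python
import numpy as np

def herm_fn(a, fn):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * fn(vals)) @ vecs.conj().T

def quantum_alpha_divergence(rho, sigma, alpha):
    """Quantum alpha-divergence D^(alpha)(rho||sigma) for full-rank density matrices."""
    if alpha == -1:
        diff = herm_fn(rho, np.log) - herm_fn(sigma, np.log)
        return float(np.trace(rho @ diff).real)
    if alpha == 1:
        # D^(+1)(rho||sigma) is defined as D^(-1)(sigma||rho).
        return quantum_alpha_divergence(sigma, rho, -1)
    a = herm_fn(rho, lambda v: v ** ((1 - alpha) / 2))
    b = herm_fn(sigma, lambda v: v ** ((1 + alpha) / 2))
    return float(4.0 / (1 - alpha ** 2) * (1 - np.trace(a @ b).real))

alpha = 0.5

# Commuting pair: both states are diagonal in the same basis.
p = np.array([0.7, 0.3])
q = np.array([0.4, 0.6])
rho, sigma = np.diag(p), np.diag(q)
d_quantum = quantum_alpha_divergence(rho, sigma, alpha)
d_classical = 4.0 / (1 - alpha ** 2) * (1 - np.sum(p ** ((1 - alpha) / 2) * q ** ((1 + alpha) / 2)))

# Non-commuting pair: rotate sigma by a real orthogonal matrix.
t = 0.3
u = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
sigma_rot = u @ sigma @ u.T
d_rot = quantum_alpha_divergence(rho, sigma_rot, alpha)
print(d_quantum, d_classical, d_rot)
```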



Remark 1.

The quantum $\alpha$-divergence can be given a stable meaning only when $|\alpha| < 3$.
(See Hasegawa [14].) However, our statement formally holds for any $\alpha$. We also assume
additional conditions on density operators such that the $\alpha$-divergence is finite.

Remark 2.

The quantum $\alpha$-divergence can be rewritten in terms of the relative $g$-entropies $H_g(\rho\|\sigma) :=
\mathrm{Tr}\, \rho^{1/2} g(L_\sigma / R_\rho)(\rho^{1/2})$, where $g$ is an operator convex function with $g(1) = 0$, and
$L_\sigma(X) = \sigma X$, $R_\rho(X) = X\rho$ are superoperators. The relative $g$-entropy was
introduced by Petz [15].

If we assume a prior probability density $\pi(\theta)$ on the parameter space $\Theta$, the mixture state
for the whole $N$ systems is given by

$$\rho^{(N)} := \int d\theta\, \pi(\theta)\, \rho_\theta^{\otimes N}. \tag{7}$$

A state of the form (7) is called an exchangeable state [16], and arises, e.g., if each subsystem
is prepared in the same unknown way, as in quantum state tomography. In a quantum
exchangeable model (7), as Schack et al. showed, a posterior distribution $\pi(\theta|x)$ naturally
arises. We assume that the whole system is in the exchangeable state, and in our setting
$\pi(\theta|x)$ is given by

$$\pi(\theta|x) = \frac{p(x|\theta)\, \pi(\theta)}{\int d\theta\, p(x|\theta)\, \pi(\theta)},$$

where $p(x|\theta) = \mathrm{Tr}\, \rho_\theta^{\otimes N} E_x$.

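Finally, let us define generalized Bayesian predictive density operators. First of all, we
consider an $\alpha$-mixture of $\sigma_\theta$ with respect to a posterior density $\pi(\theta|x)$:

$$\tilde{\sigma}^{(\alpha)}(x) := \begin{cases} \left\{ \int \sigma_\theta^{\frac{1-\alpha}{2}}\, \pi(\theta|x)\, d\theta \right\}^{\frac{2}{1-\alpha}}, & \alpha \neq 1, \\ \exp\left\{ \int \log(\sigma_\theta)\, \pi(\theta|x)\, d\theta \right\}, & \alpha = 1. \end{cases}$$

Clearly the above mixture is a positive operator and $\mathrm{Tr}\, \tilde{\sigma}^{(\alpha)}(x) > 0$. Thus, we define the
generalized Bayesian predictive density operator in the following normalized form:

$$\hat{\sigma}^{(\alpha)}(x) := \frac{1}{C_\alpha(x)}\, \tilde{\sigma}^{(\alpha)}(x), \quad C_\alpha(x) := \mathrm{Tr}\, \tilde{\sigma}^{(\alpha)}(x).$$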

In the following section, we show that the generalized Bayesian predictive density operator 
is the best predictive density operator in the sense that it minimizes the averaged quantum 
a-divergence from the true density operator. 

III. MAIN THEOREM 

In classical statistics, Corcuera and Giummole showed that the generalized Bayesian
predictive density $\hat{p}^{(\alpha)}(y|x)$ is the best predictive density under the $\alpha$-divergence when
a proper prior $\pi(\theta)$ is given. We derive the corresponding result for quantum predictive
density operators.



Theorem.

Let $\alpha \in \mathbb{R}$ be fixed. Suppose that we perform a measurement on $N$ selected subsystems
$\rho_\theta^{\otimes N}$ of a system $\rho_\theta^{\otimes (N+M)}$ composed of $N + M$ subsystems in order to estimate the remaining
$M$ subsystems $\sigma_\theta = \rho_\theta^{\otimes M}$. The true parameter value $\theta$ is unknown and a prior probability
density $\pi(\theta)$ is assumed. Let $\hat{\sigma}(x)$ be any predictive density operator, where $x$ is an outcome
of a measurement $\{E_x\}$ on the $N$ subsystems. The performance of a predictive density operator
$\hat{\sigma}(x)$ is measured with the averaged quantum $\alpha$-divergence

$$E_\theta E_x\left[ D^{(\alpha)}(\sigma_\theta\|\hat{\sigma}) \right] = \int d\theta\, \pi(\theta) \int dx\, p(x|\theta)\, D^{(\alpha)}(\sigma_\theta\|\hat{\sigma}(x))$$

from the true state $\sigma_\theta$. Then, the generalized Bayesian predictive density operator $\hat{\sigma}^{(\alpha)}(x)$
is the best predictive density operator.

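Proof.

When $\alpha \neq \pm 1$, let us consider taking an average of the difference
$D^{(\alpha)}(\sigma_\theta\|\hat{\sigma}) - D^{(\alpha)}(\sigma_\theta\|\hat{\sigma}^{(\alpha)})$. From now on, we abbreviate $\hat{\sigma}(x)$, $\hat{\sigma}^{(\alpha)}(x)$, and the normalization constant $C_\alpha(x)$ as $\hat{\sigma}$, $\hat{\sigma}_\alpha$, and $C_\alpha$.

$$E_\theta E_x\left[ D^{(\alpha)}(\sigma_\theta\|\hat{\sigma}) - D^{(\alpha)}(\sigma_\theta\|\hat{\sigma}_\alpha) \right]$$

$$= \int d\theta\, \pi(\theta) \int dx\, p(x|\theta)\, \frac{4}{1-\alpha^2}\, \mathrm{Tr}\left\{ \sigma_\theta^{\frac{1-\alpha}{2}} \left( \hat{\sigma}_\alpha^{\frac{1+\alpha}{2}} - \hat{\sigma}^{\frac{1+\alpha}{2}} \right) \right\}$$

$$= \int dx\, p_x \int d\theta\, \pi(\theta|x)\, \frac{4}{1-\alpha^2}\, \mathrm{Tr}\left\{ \sigma_\theta^{\frac{1-\alpha}{2}} \left( \hat{\sigma}_\alpha^{\frac{1+\alpha}{2}} - \hat{\sigma}^{\frac{1+\alpha}{2}} \right) \right\}$$

$$= \int dx\, p_x\, \frac{4}{1-\alpha^2}\, \mathrm{Tr}\left\{ \left[ \int d\theta\, \pi(\theta|x)\, \sigma_\theta^{\frac{1-\alpha}{2}} \right] \left( \hat{\sigma}_\alpha^{\frac{1+\alpha}{2}} - \hat{\sigma}^{\frac{1+\alpha}{2}} \right) \right\}$$

$$= \int dx\, p_x\, \frac{4}{1-\alpha^2}\, \mathrm{Tr}\left\{ C_\alpha^{\frac{1-\alpha}{2}}\, \hat{\sigma}_\alpha^{\frac{1-\alpha}{2}} \left( \hat{\sigma}_\alpha^{\frac{1+\alpha}{2}} - \hat{\sigma}^{\frac{1+\alpha}{2}} \right) \right\}$$

$$= \int dx\, p_x\, C_\alpha^{\frac{1-\alpha}{2}}\, \frac{4}{1-\alpha^2} \left( 1 - \mathrm{Tr}\, \hat{\sigma}_\alpha^{\frac{1-\alpha}{2}} \hat{\sigma}^{\frac{1+\alpha}{2}} \right)$$

$$= \int dx\, p_x\, C_\alpha^{\frac{1-\alpha}{2}}\, D^{(\alpha)}(\hat{\sigma}_\alpha\|\hat{\sigma}) \geq 0,$$

where $p_x := \int d\theta'\, \pi(\theta')\, p(x|\theta')$ is the marginal density of $x$. In the fourth equality we used $\int d\theta\, \pi(\theta|x)\, \sigma_\theta^{\frac{1-\alpha}{2}} = C_\alpha^{\frac{1-\alpha}{2}}\, \hat{\sigma}_\alpha^{\frac{1-\alpha}{2}}$, which follows from the definition of the $\alpha$-mixture, and in the fifth we used $\mathrm{Tr}\, \hat{\sigma}_\alpha = 1$. The last inequality holds due
to the positivity of the quantum $\alpha$-divergence, $D^{(\alpha)}(\sigma\|\sigma') \geq 0$, together with $p_x \geq 0$ and $C_\alpha > 0$. Since $\hat{\sigma}(x)$ is
arbitrarily chosen, it is shown that $\hat{\sigma}^{(\alpha)}(x)$ is better than any other $\hat{\sigma}(x)$. We can repeat
the same procedure for $\alpha = \pm 1$.
Q.E.D.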
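
The chain of equalities in the proof can be checked numerically in a small example. The following is a minimal sketch using NumPy under assumptions that are not part of the paper: a discrete prior on three hypothetical parameter values replaces the integral over $\theta$, $N = M = 1$, the states are full-rank qubit states, and the measurement is projective in the computational basis. The competitor is the ordinary posterior mean state. The two accumulated quantities correspond to the first and the last expression of the proof.

```python
import numpy as np

def herm_fn(a, fn):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * fn(vals)) @ vecs.conj().T

def power(a, s):
    return herm_fn(a, lambda v: v ** s)

def divergence(rho, sigma, alpha):
    """Quantum alpha-divergence for alpha != +-1 and full-rank states."""
    overlap = np.trace(power(rho, (1 - alpha) / 2) @ power(sigma, (1 + alpha) / 2)).real
    return 4.0 / (1 - alpha ** 2) * (1 - overlap)

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

thetas = [0.2, 0.9, 1.7]             # hypothetical parameter grid
prior = np.array([0.5, 0.3, 0.2])    # hypothetical discrete prior
states = [(I2 + 0.8 * (np.cos(t) * Z + np.sin(t) * X)) / 2 for t in thetas]
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
alpha = 0.3

lhs = 0.0   # averaged difference of divergences (first expression in the proof)
rhs = 0.0   # sum over x of p_x * C^((1-alpha)/2) * D(best || other) (last expression)
for E in povm:
    lik = np.array([np.trace(s @ E).real for s in states])   # p(x|theta)
    p_x = float(prior @ lik)                                  # marginal of x
    post = prior * lik / p_x                                  # posterior pi(theta|x)
    inner = sum(w * power(s, (1 - alpha) / 2) for w, s in zip(post, states))
    mix = power(inner, 2 / (1 - alpha))                       # unnormalized alpha-mixture
    C = np.trace(mix).real
    best = mix / C                                            # generalized Bayesian predictor
    other = sum(w * s for w, s in zip(post, states))          # competitor: posterior mean state
    lhs += p_x * sum(w * (divergence(s, other, alpha) - divergence(s, best, alpha))
                     for w, s in zip(post, states))
    rhs += p_x * C ** ((1 - alpha) / 2) * divergence(best, other, alpha)

print(lhs, rhs)
```

The two printed numbers should agree up to rounding and be non-negative, in line with the final inequality of the proof.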
IV. REMARKS



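When $\alpha = 0$, the quantum $\alpha$-divergence is closely related to the fidelity $F(\rho, \sigma) :=
\mathrm{Tr}|\sqrt{\rho}\sqrt{\sigma}|$, where $|A| := \sqrt{A^* A}$. Since $|\mathrm{Tr} A| \leq \mathrm{Tr}|A|$, we have $\mathrm{Tr}\, \rho^{1/2}\sigma^{1/2} \leq F(\rho, \sigma)$, and hence

$$D^{(0)}(\rho\|\sigma) = 4(1 - \mathrm{Tr}\, \rho^{1/2}\sigma^{1/2}) \geq 4(1 - F(\rho, \sigma)).$$

Equality holds when $\rho$ and $\sigma$ commute. When both $\rho$ and $\sigma$ are pure states, $\mathrm{Tr}\, \rho^{1/2}\sigma^{1/2} = F(\rho, \sigma)^2$, so that $D^{(0)}(\rho\|\sigma) = 4(1 - F(\rho, \sigma)^2)$. The
fidelity is often used as a measure in quantum information theory. How our theorem
can be extended when we adopt the fidelity as a loss function is left for future study.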
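
The relation between $D^{(0)}$ and the fidelity can be checked numerically. The following is a minimal sketch using NumPy; the helper names and the example states are hypothetical. It uses the fact that $\mathrm{Tr}|A|$ is the sum of the singular values of $A$, and compares $D^{(0)}(\rho\|\sigma)$ with $4(1 - F(\rho, \sigma))$ for a commuting pair, a non-commuting pair, and a pair of pure states.

```python
import numpy as np

def sqrtm_psd(a):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = Tr|sqrt(rho) sqrt(sigma)|, the sum of singular values."""
    product = sqrtm_psd(rho) @ sqrtm_psd(sigma)
    return float(np.linalg.svd(product, compute_uv=False).sum())

def d_zero(rho, sigma):
    """Quantum alpha-divergence at alpha = 0: 4 (1 - Tr rho^{1/2} sigma^{1/2})."""
    return float(4 * (1 - np.trace(sqrtm_psd(rho) @ sqrtm_psd(sigma)).real))

# Commuting pair (both diagonal).
rho = np.diag([0.7, 0.3])
sigma = np.diag([0.4, 0.6])

# Non-commuting pair: rotate sigma.
t = 0.3
u = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
sigma_rot = u @ sigma @ u.T

# Pair of pure states with overlap cos(0.5).
a = np.array([1.0, 0.0])
b = np.array([np.cos(0.5), np.sin(0.5)])
pure_a, pure_b = np.outer(a, a), np.outer(b, b)

gap_commuting = d_zero(rho, sigma) - 4 * (1 - fidelity(rho, sigma))
gap_rotated = d_zero(rho, sigma_rot) - 4 * (1 - fidelity(rho, sigma_rot))
gap_pure = d_zero(pure_a, pure_b) - 4 * (1 - fidelity(pure_a, pure_b))
print(gap_commuting, gap_rotated, gap_pure)
```

Each gap is $D^{(0)}$ minus $4(1 - F)$: it should vanish for the commuting pair and be non-negative in the other two cases.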